Abstract

Background: The prevalence of infection with the three common soil-transmitted helminths (i.e. Ascaris 
lumbricoides, Trichuris trichiura, and hookworm) in Bolivia is among the highest in Latin America. However, the 
spatial distribution and burden of soil-transmitted helminthiasis are poorly documented. 

Methods: We analysed historical survey data using Bayesian geostatistical models to identify determinants of the 
distribution of soil-transmitted helminth infections, predict the geographical distribution of infection risk, and assess 
treatment needs and costs in the frame of preventive chemotherapy. Rigorous geostatistical variable selection 
identified the most important predictors of A. lumbricoides, T. trichiura, and hookworm transmission.

Results: Precipitation during the wettest quarter above 400 mm favours the distribution of
A. lumbricoides. Altitude has a negative effect on T. trichiura. Hookworm is sensitive to temperature during the
coldest month. We estimate that 38.0%, 19.3%, and 11.4% of the Bolivian population is infected with A. lumbricoides,
T. trichiura, and hookworm, respectively. Assuming independence of the three infections, 48.4% of the population is 
infected with any soil-transmitted helminth. Empirical-based estimates, according to treatment recommendations 
by the World Health Organization, suggest a total of 2.9 million annualised treatments for the control of 
soil-transmitted helminthiasis in Bolivia. 

Conclusions: We provide estimates of soil-transmitted helminth infections in Bolivia based on high-resolution 
spatial prediction and an innovative variable selection approach. However, the scarcity of the data suggests that a 
national survey is required for more accurate mapping that will govern spatial targeting of soil-transmitted 
helminthiasis control. 
Background

Soil-transmitted helminth infections are mainly caused 
by the intestinal worms Ascaris lumbricoides, Trichuris 
trichiura, and the two hookworm species Ancylostoma 
duodenale and Necator americanus [1]. They are the 
most prevalent neglected tropical diseases, and they are 
widely distributed across Latin America [2,3]. Soil- 
transmitted helminthiasis and other neglected tropical 
diseases primarily affect low-income populations, causing 
chronic conditions, learning disabilities, and reduced 
productivity and income earning capacity in later life. 
Morbidity control and, where resources allow, local elimination
are now recognised as a priority for achieving the
millennium development goals [4]. In 2009, the Pan
American Health Organization (PAHO) developed a plan 
to eliminate neglected and other poverty-related diseases 
in Latin America and Caribbean countries. Soil-transmitted 
helminthiases were identified as target diseases to 
be controlled through preventive chemotherapy and by 
promoting access to clean water, improved sanitation, and 
better hygiene behaviour [5]. Control programmes require 
reliable baseline information of the geographical distribu- 
tion of the number of infected people and disease burden 
estimates in order to enhance the spatial targeting and 
cost-effectiveness of planned interventions [6,7]. 
Bolivia is ranked last among the Western Hemisphere
countries in terms of key health indicators. For example,
the child mortality rate is the worst in South America and,
according to the 2001 census, 64% of the population did
not have enough income to meet their basic needs [8].
The prevalence of soil-transmitted helminth infection is
estimated at around 35% [9]. However, the geographical
distribution and burden of soil-transmitted helminth
infections are poorly documented.

In the past 20 years, progress in geographical informa- 
tion system (GIS) and remote sensing techniques, coupled 
with spatial modelling, enabled a better understanding of 
helminth ecology and mapping at high spatial resolution 
[6,7,10-13]. Ecological niche and biology-driven models 
have been used in assessing the distribution of helminth 
infections [14-16]. Bayesian geostatistical models offer a 
robust methodology for identifying determinants of the 
disease distribution and for predicting infection risk and 
burden at high spatial scales [17]. These models have been 
widely used in assessing the relationship between helminth
infection and demographic, environmental, and socioeconomic
predictors, at sub-national [11,18], national
[19], or regional scales [13,20,21]. In the Americas, high 
resolution, geostatistical, model-based risk estimates have 
been obtained for the whole continent [22] as well as for 
Brazil [23]. A key issue in geostatistical modelling is the 
selection of the predictors. Most of the variable selection 
methods in geostatistical applications rely on standard 
methods, such as stepwise regression or bivariate associa- 
tions that are appropriate for non-spatial data [10,11]. 
However, ignoring spatial correlation leads to incor- 
rect estimates of the statistical significance of the pre- 
dictors included in the model. Recently, Bayesian 
variable selection has been introduced in geostatistical 
disease mapping [21,24]. 

The purpose of this paper was to map the geographical 
distribution of A. lumbricoides, T. trichiura, and hook- 
worm in Bolivia, and to estimate the risk, number of 
infected school-aged children, and the costs related to 
treatment interventions in the country. Survey data were 
extracted from published and unpublished sources. 
Bayesian geostatistical models were employed using 
rigorous variable selection procedures. 

Methods 

Disease data 

Data on the prevalence of soil-transmitted helminth infec- 
tion were extracted from the global neglected tropical dis- 
eases (GNTD) database (www.gntd.org) [13,16,21,22,25]. 
The GNTD database is an open-access platform con- 
sisting of geo-referenced survey data pertaining to schi- 
stosomiasis, soil-transmitted helminthiasis, and other 
neglected tropical diseases. Surveys are identified through 
systematic searches of electronic databases such as 



PubMed and ISI Web of Knowledge with no restriction of 
publication date or language. Our search strategy, includ- 
ing data quality appraisal, is summarised in Table 1. 

Environmental, socioeconomic, and population data 

A total of 40 environmental and socioeconomic variables 
were considered in our analysis. Environmental variables 
included 19 interpolated climatic data from weather sta- 
tions related to temperature and precipitation, vegetation 
proxies such as the enhanced vegetation index (EVI) and 
normalized difference vegetation index (NDVI), altitude, 
land cover, as well as information on soil acidity and soil 
moisture. Various unsatisfactory basic needs (UBN) pov- 
erty indicators related to adequate housing material, insuf- 
ficient housing space, inadequate services of water and 
sewer systems and inadequate health attention were used 
as proxies of poverty. In addition, the human development
index (HDI) and infant mortality rate (IMR) were considered
as alternative poverty measures. The impact of direct human
influence on ecosystems was accounted for by the human
influence index (HII). Population density and the proportion
of school-aged children (age: 5-14 years) were used
to estimate treatment needs and costs of intervention.
Sources of the variables, together with their spatial and 
temporal resolution, are summarised in Table 2. 

For prediction purposes, a grid of 5 × 5 km spatial resolution
was created. Environmental data available at 1 × 1 km
spatial resolution were averaged over their closest neighbours.
Soil acidity, soil moisture, and infant mortality rate
were linked to the prediction pixel with the closest distance.
UBN and HDI were re-scaled by assigning to each
grid pixel the value of the administrative unit to which it
belongs. Re-scaling was performed in ArcMap version 10.0
(Environmental Systems Research Institute; Redlands,
CA, USA).
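
The averaging step can be illustrated with a minimal sketch. It assumes that "averaged over their closest neighbours" means a plain mean over non-overlapping 5 × 5 blocks of 1 km cells; the text does not specify the exact neighbourhood, and the function name `block_mean` and the toy raster are hypothetical.

```python
import numpy as np

def block_mean(raster: np.ndarray, factor: int = 5) -> np.ndarray:
    """Average a 2-D raster over non-overlapping factor x factor blocks.

    Assumption: each coarse pixel is the mean of the fine cells it covers;
    NaN cells (e.g. missing data) are ignored.
    """
    rows, cols = raster.shape
    # Drop trailing rows/columns that do not fill a complete block.
    rows_trim, cols_trim = rows - rows % factor, cols - cols % factor
    trimmed = raster[:rows_trim, :cols_trim]
    blocks = trimmed.reshape(rows_trim // factor, factor,
                             cols_trim // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))

# Toy 10 x 10 raster at "1 km" resolution, aggregated to a 2 x 2 grid at "5 km".
fine = np.arange(100, dtype=float).reshape(10, 10)
coarse = block_mean(fine, factor=5)
print(coarse)
```

Variables that are only available per administrative unit (UBN, HDI) would instead be assigned by a polygon lookup, which is not shown here.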

Geostatistical model 

Disease survey data are typically binomially distributed
and modelled via logistic regression. More precisely, let
$Y_i$, $n_i$, and $p_i$ be the number of infected individuals, the
number of individuals screened, and the prevalence or risk
of infection at location $i$, respectively, such that $Y_i \sim \mathrm{Bin}(n_i, p_i)$.
Spatial correlation is taken into account by introducing
location-specific parameters $\varphi_i$ that are considered as
unobserved latent data from a stationary spatial Gaussian
process. We modelled a temporal trend, the selected predictors
(i.e. environmental and socioeconomic factors) $X_i$,
and $\varphi_i$ on the logit scale: $\mathrm{logit}(p_i) = X_i^T \beta + \varphi_i$. The temporal
trend was modelled by a binary variable $T_i$ indicating
whether a survey was carried out before 1995 or from 1995 onwards.

We assumed that $\varphi \sim \mathrm{MVN}(0, \Sigma)$ with variance-covariance
matrix $\Sigma$. Geographical correlation was modelled by an
isotropic exponential correlation function of distance,
i.e. $\Sigma_{cd} = \sigma^2_{sp} \exp(-\rho\, d_{cd})$, where $d_{cd}$ is the Euclidean
distance between locations $c$ and $d$, $\sigma^2_{sp}$ is the geographical
variability known as the partial sill, and $\rho$ is a smoothing
parameter controlling the rate of correlation decay. The
geographic dependency (range) was defined as the minimum
distance at which spatial correlation between locations
is less than 5% and is calculated as $3/\rho$. To
facilitate model fit, the model was formulated using a
Bayesian framework of inference. Vague normal prior
distributions $\beta \sim N(0, \sigma^2 I)$ were adopted for the regression
coefficients, an inverse gamma distribution
$\sigma^2_{sp} \sim \mathrm{IG}(a_{\sigma^2}, b_{\sigma^2})$ was chosen for the variance $\sigma^2_{sp}$, and a
gamma distribution was assumed for the spatial decay
$\rho$, $\rho \sim \mathrm{G}(a_\rho, b_\rho)$.
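
To make the model structure concrete, the following sketch simulates data from a binomial model with an exponentially correlated spatial random effect and checks the range definition $3/\rho$. It is an illustration only, not the fitting procedure used in the study: the survey coordinates, the single covariate, and the coefficient values are hypothetical, and $\rho$ is chosen so that the range equals the 128.4 km reported for hookworm in Table 4.

```python
import numpy as np

def exponential_covariance(coords, sigma2_sp, rho):
    """Sigma_cd = sigma2_sp * exp(-rho * d_cd), d_cd the Euclidean distance."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma2_sp * np.exp(-rho * dist)

def spatial_range(rho):
    """Distance beyond which the correlation drops below 5%: exp(-3) ~ 0.0498."""
    return 3.0 / rho

rng = np.random.default_rng(0)
coords = rng.uniform(0, 500, size=(30, 2))   # hypothetical survey locations (km)
sigma2_sp, rho = 1.0, 3.0 / 128.4            # rho chosen so that the range is 128.4 km

Sigma = exponential_covariance(coords, sigma2_sp, rho)
phi = rng.multivariate_normal(np.zeros(len(coords)), Sigma)  # spatial random effect

x = rng.normal(size=len(coords))             # one standardised covariate (hypothetical)
beta0, beta1 = -1.0, 0.5                     # illustrative regression coefficients
p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x + phi)))  # inverse logit
n = rng.integers(20, 200, size=len(coords))  # individuals screened per location
y = rng.binomial(n, p)                       # number of infected individuals

print(f"range = {spatial_range(rho):.1f} km")
```

In the study itself the parameters are estimated by MCMC rather than fixed; the sketch only shows how the likelihood, the latent Gaussian process, and the range relate to one another.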

Geostatistical variable selection 

Bayesian stochastic search variable selection [26] was 
performed to select the most important predictors 
among the 40 socioeconomic and environmental predic- 
tors, while taking into account the spatial correlation 
in the data. Predictors were either standardised or 
categorised if they presented a non-linear bivariate asso- 
ciation with the observed helminthiasis prevalence (on 
the logit scale). Furthermore, we considered a spike and 
slab prior distribution for the regression coefficients 
[27], which improves convergence properties of the 
Markov chain Monte Carlo (MCMC) simulation and al- 
lows selection of blocks of covariates such as categorical 
ones. In addition, we assessed correlation between the
predictors and forced the model to choose only one (or
none) predictor among those highly correlated (i.e. absolute
value of Pearson's correlation coefficient larger
than 0.9). The geostatistical variable selection explores
all possible models and the final model is the one
presenting the highest posterior probability.
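
The grouping of highly correlated predictors can be sketched as follows. The text gives only the threshold (absolute Pearson correlation above 0.9); treating groups as connected components of the thresholded correlation graph is an assumption made here for illustration, and the function name `correlation_groups` and the simulated predictors are hypothetical.

```python
import numpy as np

def correlation_groups(X, threshold=0.9):
    """Group columns of X whose absolute Pearson correlation exceeds the threshold.

    Assumption: groups are connected components of the graph linking
    predictors with |r| > threshold.
    """
    corr = np.corrcoef(X, rowvar=False)
    n_pred = corr.shape[0]
    adjacent = np.abs(corr) > threshold
    labels = [-1] * n_pred
    groups = []
    for start in range(n_pred):
        if labels[start] != -1:
            continue
        group_id = len(groups)
        labels[start] = group_id
        stack, members = [start], []
        while stack:  # depth-first search over correlated neighbours
            j = stack.pop()
            members.append(j)
            for k in range(n_pred):
                if adjacent[j, k] and labels[k] == -1:
                    labels[k] = group_id
                    stack.append(k)
        groups.append(sorted(members))
    return groups

rng = np.random.default_rng(1)
base = rng.normal(size=(200, 3))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.05 * rng.normal(size=200),  # near-duplicate of column 0
    base[:, 1],
    base[:, 2],
])
groups = correlation_groups(X)
print(groups)
```

Within each multi-variable group the selection procedure then admits at most one predictor, while single-variable groups are simply included or excluded.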

The geostatistical variable selection specification is
summarised in Figure 1. In particular, predictors were
classified into 19 groups $b$ ($b = 1, \ldots, 19$), depending on
their mutual correlations. Thirteen predictors that were
only moderately correlated with any other predictors
were separated into single-variable groups. Highly correlated
predictors were divided into six groups, each
containing 2-8 variables $X_{j_b}$, $j_b = 1, \ldots, J_b$. The regression
coefficients are defined as the product of an overall contribution
$\alpha_{j_b}$ of the predictor $X_{j_b}$ and the effect $\xi_{l j_b}$ of
each of its elements (i.e. categories) $X_{l j_b}$, $l = 1, \ldots, L$, where $L$ is
the number of categories (excluding baseline) of the predictor $X_{j_b}$. We
assigned a spike and slab prior [27,28], which is a scaled
normal mixture of inverse-gamma distributions, to $\alpha_{j_b}$, that is
$\alpha_{j_b} \sim N(0, \tau^2_{j_b})$, where

$$\tau^2_{j_b} \sim \gamma_{1b}\gamma_{2j_b}\,\mathrm{IG}(a_\tau, b_\tau) + (1 - \gamma_{1b}\gamma_{2j_b})\,\nu_0\,\mathrm{IG}(a_\tau, b_\tau).$$

$a_\tau$ and $b_\tau$ are fixed parameters of a non-informative
inverse-gamma distribution, while $\nu_0$ is a small
constant shrinking $\alpha_{j_b}$ to zero when the predictor is excluded.
The presence or absence of the predictors is defined
by the product of two indicators $\gamma_{1b}$ and
$\gamma_{2j_b}$, where $\gamma_{1b}$ determines the presence
or absence of group $b$ in the model and
$\gamma_{2j_b}$, $j_b = 1, \ldots, J_b$, allows selection of a single predictor
within the group. A Bernoulli and a multinomial prior
distribution are assigned to $\gamma_{1b}$ and $\gamma_{2b}$, respectively, such
that $\gamma_{1b} \sim \mathrm{Bern}(\Omega_1)$ and $\gamma_{2b} \sim \mathrm{Multi}(1, \Omega_{2b1}, \ldots, \Omega_{2bJ_b})$ with
inclusion probabilities $\Omega_1$ and $\Omega_{2b}$. To allow greater flexibility
in estimating model size, these probabilities are considered
as hyper-parameters having non-informative beta
and Dirichlet distributions. A mixture of two Gaussian distributions
is assumed for $\xi_{l j_b}$, $\xi_{l j_b} \sim N(m_{l j_b}, 1)$ with
$m_{l j_b} \sim \frac{1}{2}\delta_{1}(m_{l j_b}) + \frac{1}{2}\delta_{-1}(m_{l j_b})$, which shrinks $\xi_{l j_b}$ towards $|1|$
(multiplicative identity). For moderately correlated predictors,
$\gamma_{2j_b}$ is fixed to 1, while the effect of linear predictors
is only defined by the overall contribution $\alpha$.

To complete model specification, the spatial random
effect $\varphi$ is modelled as defined in the previous subsection
and a vague normal distribution is assigned to the
constant term of the model. The subset of variables included
in the models with the highest posterior probabilities
identified the final models.



Implementation details 

We considered the following values for the parameters
of the prior distributions: $\sigma^2 = 100$, $(a_\rho, b_\rho) = (0.01, 0.01)$,
$(a_{\sigma^2}, b_{\sigma^2}) = (2.01, 1.01)$, $(a_\tau, b_\tau) = (5, 25)$,
$(a_{\Omega_1}, b_{\Omega_1}) = (1, 1)$, $(a_{\Omega_2}, b_{\Omega_2}) = (1, 1)$, and $\nu_0 = 0.00025$.
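
The effect of the spike and slab prior with these hyper-parameter values can be illustrated by drawing from it directly. The sketch below assumes that IG(a, b) denotes an inverse-gamma distribution with shape a and scale b (so that a draw is the reciprocal of a Gamma draw with shape a and rate b); the function name `sample_alpha` is hypothetical and the code is not part of the original WinBUGS implementation.

```python
import numpy as np

def sample_alpha(included, a_tau=5.0, b_tau=25.0, v0=0.00025, rng=None):
    """Draw overall contributions alpha ~ N(0, tau2) under the spike and slab prior.

    included: boolean array, the product gamma_1b * gamma_2jb for each predictor.
    tau2 is IG(a_tau, b_tau) when included (slab) and v0 * IG(a_tau, b_tau)
    when excluded (spike). Assumption: IG uses a shape/scale parameterisation.
    """
    rng = np.random.default_rng() if rng is None else rng
    included = np.asarray(included, dtype=bool)
    # Inverse-gamma draw as the reciprocal of Gamma(shape=a, scale=1/b).
    ig = 1.0 / rng.gamma(shape=a_tau, scale=1.0 / b_tau, size=included.shape)
    tau2 = np.where(included, ig, v0 * ig)
    return rng.normal(0.0, np.sqrt(tau2))

rng = np.random.default_rng(42)
slab = sample_alpha(np.ones(20000, dtype=bool), rng=rng)    # predictor included
spike = sample_alpha(np.zeros(20000, dtype=bool), rng=rng)  # predictor excluded

print(f"slab sd  ~ {slab.std():.3f}")
print(f"spike sd ~ {spike.std():.4f}")
```

Under these assumptions the slab allows coefficients of appreciable size, whereas the spike concentrates excluded coefficients tightly around zero, which is the shrinkage behaviour described in the text.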

MCMC simulations were used to estimate model pa- 
rameters. For variable selection, a burn-in of 50,000 iter- 
ations was performed and another 50,000 iterations were 
run to identify the model with the highest posterior 
probability. For each infection, the best geostatistical 
model was fitted with one chain sampler and a burn-in 
of 5,000 iterations. Convergence was assessed after an 
average of 50,000 iterations using the Raftery and Lewis 
[29] diagnostics. A posterior sample of 1,000 values was 
used for validation purposes and for prediction at un- 
sampled locations. Prediction was carried out using 
Bayesian kriging [17] over a grid of 26,519 pixels of 5 x 5 
km spatial resolution. The median and standard deviation
of the predicted posterior distribution were plotted
to produce smooth risk maps together with their uncertainty.
Analyses were implemented in WinBUGS 1.4 (Imperial
College and Medical Research Council; London,
UK), while R version 2.7.2 (The R Foundation for
Statistical Computing) was used for predictions. Non-spatial
explorative statistical analyses were performed in
Stata version 10.0 (Stata Corporation; College Station,
USA).

Table 1 Search strategy for the identification of soil-transmitted helminth infection prevalence survey data in Bolivia

Keywords: Bolivi* AND helminth* (OR ascari*, OR trichur*,
OR hookworm, OR necator, OR ankylostom*, OR
ancylostom*, OR strongy*, OR hymenolepis*,
OR toxocara*, OR enterobius*, OR
geohelminth*, OR nematode)

Exclusion criteria: Hospital-based study; case-control study (except
control group); clinical trials (except baseline);
drug-efficacy study (except placebo group);
displaced population (travellers, military,
expatriates, nomads); population treated for the
infection during the past year; unclear location
of the survey; sample size <10

Quality control measures: Double check of each entry; search and
elimination of duplicates; recalculation of
prevalence; verification in Google Maps that
point level coordinates correspond to human
settlement

Model validation 

Models were fitted on a random training sample of 39
locations for A. lumbricoides and T. trichiura, and 37
locations for hookworm. Model validation was performed
on the remaining 10 test locations (around 20% of the
total locations). The predictive performance was calculated
as the proportion of test locations being correctly
predicted within the $k$-th Bayesian credible interval (BCI)
of the posterior predictive distribution (limited by the
lower and upper quantiles $\mathrm{BCI}^{l}_{ik}$ and $\mathrm{BCI}^{u}_{ik}$, respectively),
where $k$ indicates the probability coverage of the
interval:

$$\frac{1}{N} \sum_{i=1}^{N} \min\left( \mathbb{1}\{\mathrm{BCI}^{l}_{ik} < p_i\},\ \mathbb{1}\{\mathrm{BCI}^{u}_{ik} > p_i\} \right),$$

where $N = 10$ is the number of test locations and $p_i$ is the observed prevalence at test location $i$.

The higher the number of test locations within the
narrowest and smallest coverage BCI, the better the model
predictive ability.

Table 2 Data sources and properties of the predictors explored to model soil-transmitted helminth infection risk in
Bolivia

| Data type | Source | Date | Temporal resolution | Spatial resolution |
|---|---|---|---|---|
| 19 climatic variables related to temperature and precipitation | WorldClim¹ | 1950-2000 | | 1 km |
| Altitude | SRTM² | 2000 | | 1 km |
| Land cover | MODIS/Terra³ | 2000-2011 | Yearly | 1 km |
| EVI / NDVI | MODIS/Terra³ | 2000-2011 | 16 days | 1 km |
| Soil acidity / soil moisture | ISRIC-WISE⁴ | 1960-2000 | | 10 km |
| Unsatisfactory basic needs (UBN) | Census⁵ | 2001 | 10 years | Municipality |
| Infant mortality rate (IMR) | CIESIN⁶ | 2005 | Yearly | 5 km |
| Human influence index (HII) | LTW⁷ | 2005 | | 1 km |
| Human development index (HDI) | PAHO⁸ | 2005 | | Municipality |
| Population density | WISE3⁴ | 2010 | | 10 km |
| School-aged children proportion | IDB⁹ | 2010 | | Country |

¹ WorldClim Global Climate database v.1.4; available at: http://www.worldclim.org/ (accessed: 1 March 2012).
² Shuttle Radar Topography Mission (SRTM); available at: http://www.worldclim.org/ (accessed: 1 March 2012).
³ Moderate Resolution Imaging Spectroradiometer (MODIS); available at: https://lpdaac.usgs.gov/ (accessed: 15 December 2012).
⁴ Global soil profile data ISRIC-WISE database v.1.2; available at: http://www.isric.org/ (accessed: 15 December 2012).
⁵ Instituto nacional de estadistica, 2001 census; available at: http://www.ine.gob.bo/ (accessed: 1 March 2012).
⁶ 2005 Global subnational infant mortality rates, Center for International Earth Science Information Network (CIESIN). CIESIN, Palisades, NY, USA; available at: http://www.ciesin.columbia.edu/povmap/ds_global.html (accessed: 1 March 2012).
⁷ Last of the Wild Data Version 2, 2005 (LTW-2): Global Human Footprint Dataset (Geographic). Wildlife Conservation Society (WCS) and Center for International Earth Science Information Network (CIESIN); available at: http://www.ciesin.org/wildareas/ (accessed: 1 March 2012).
⁸ Pan American Health Organization; personal communication.
⁹ International Data Base (IDB) United States Census Bureau; available at: http://www.census.gov/population/international/ (accessed: 1 March 2012).
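
The validation measure described in this subsection can be sketched as a short function that counts how many test locations have their observed prevalence inside the $k$ credible interval of the posterior predictive sample. The posterior draws and observed values below are simulated placeholders, not study data, and the function name `bci_coverage` is hypothetical.

```python
import numpy as np

def bci_coverage(posterior_samples, observed, k=0.95):
    """Proportion of test locations whose observed prevalence lies inside
    the equal-tailed credible interval with probability coverage k.

    posterior_samples: array of shape (n_draws, n_locations)
    observed: array of shape (n_locations,)
    """
    lower_q = (1.0 - k) / 2.0
    lower = np.quantile(posterior_samples, lower_q, axis=0)
    upper = np.quantile(posterior_samples, 1.0 - lower_q, axis=0)
    inside = (lower < observed) & (observed < upper)
    return inside.mean()

rng = np.random.default_rng(7)
# Hypothetical posterior predictive draws: 1,000 samples at 10 test locations.
samples = rng.beta(2.0, 5.0, size=(1000, 10))
observed = np.full(10, 0.25)
observed[0] = 0.99  # one location far outside the predictive distribution

coverage = bci_coverage(samples, observed, k=0.95)
print(f"coverage within the 95% BCI: {coverage:.0%}")
```

Evaluating the same function over a sequence of values of $k$ gives the kind of coverage profile summarised for the three species in the Results.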

Treatment needs and estimated costs 

The number of infected school-aged children was calcu- 
lated for each pixel from the geostatistical model-based 
estimated risk and the population density. According 
to guidelines put forward by the World Health 
Organization (WHO), all school-aged children should be 
treated twice a year in high-risk communities (prevalence
of any soil-transmitted helminth infection >50%)
and once every year in low-risk communities (prevalence 
of any soil-transmitted helminth infection between 20% 
and 50%). Large-scale preventive chemotherapy is not 
recommended in areas where prevalence is less than 
20%; indeed treatment should be delivered on a case- 
by-case basis in such areas [30]. We estimated the number
of albendazole or mebendazole treatments needed
during one year in the school-aged population, considering
different units at which levels of risk were determined
(i.e. pixel, municipality, province, and department). Hence,
we followed the same methodology as for estimating
annualised praziquantel needs against schistosomiasis
[31]. To calculate the cost of a school-based deworming
programme in Bolivia, the estimated number of treatments
was multiplied by an average unit cost equivalent
to US$ 0.25, which includes additional expenses for training,
drug distribution, and administration [9,32].

Figure 1 Acyclic graph of the geostatistical variable selection. Stochastic and logical nodes are represented as ellipses. Dashed arrows are
logical links and straight line arrows are stochastic dependencies. Fixed parameters of the prior distributions are highlighted in pink.
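
The treatment-needs calculation in this subsection can be sketched at pixel level as follows. The risk of any soil-transmitted helminth infection is combined under the independence assumption, $p_{any} = 1 - (1 - p_A)(1 - p_T)(1 - p_H)$, each pixel is assigned zero, one, or two treatment rounds according to the WHO thresholds quoted above, and the total is multiplied by the unit cost of US$ 0.25. The three pixels, their risks, and their child counts are hypothetical; the function names are illustrative only.

```python
import numpy as np

UNIT_COST_USD = 0.25  # average unit cost per treatment quoted in the text

def combined_risk(p_ascaris, p_trichuris, p_hookworm):
    """Risk of infection with at least one species, assuming independence."""
    return 1.0 - (1.0 - p_ascaris) * (1.0 - p_trichuris) * (1.0 - p_hookworm)

def treatments_per_child(p_any):
    """WHO-style rule: two rounds per year above 50%, one round between
    20% and 50%, no large-scale treatment below 20%."""
    p_any = np.asarray(p_any, dtype=float)
    return np.where(p_any > 0.5, 2, np.where(p_any >= 0.2, 1, 0))

# Hypothetical predicted risks and school-aged children in three pixels.
p_asc = np.array([0.60, 0.20, 0.05])
p_tri = np.array([0.30, 0.10, 0.02])
p_hook = np.array([0.20, 0.05, 0.01])
children = np.array([1000, 2000, 500])

p_any = combined_risk(p_asc, p_tri, p_hook)
rounds = treatments_per_child(p_any)
annual_treatments = int((rounds * children).sum())
annual_cost = annual_treatments * UNIT_COST_USD

print(p_any.round(3), rounds, annual_treatments, annual_cost)
```

Applying the same rule to risks aggregated by municipality, province, or department changes which units cross the 20% and 50% thresholds, which is why the number of children targeted depends on the aggregation level.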



Results 

Seven out of 59 identified peer-reviewed publications 
reported soil-transmitted helminth infection prevalence 
data in Bolivia [33-39]. For the current investigation, 
additional data were obtained from a 2006 report of the 
Ministry of Health (MoH) in Bolivia [40]. 

We obtained relevant prevalence data for A. lum- 
bricoides, T. trichiura, and hookworm for 49, 49, and 47 




Figure 2 Frequency distribution of the survey periods in Bolivia for A. lumbricoides (A), T. trichiura (B), and hookworm (C).
Table 3 Variables selected by the geostatistical variable selection approach

Columns A1-A3, T1-T3, and H1-H3 give the three best models for A. lumbricoides, T. trichiura, and hookworm infection, respectively.

| Variable | A1 | A2 | A3 | T1 | T2 | T3 | H1 | H2 | H3 |
|---|---|---|---|---|---|---|---|---|---|
| Group 1 | | | | | | | | | |
| Home with indoor plumbing¹ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| People with drinking water at home¹ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Pipe network | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Population with high quality of life | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Population with UBN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Population with sanitation at home | 0 | 0 | 0 | 0 | X | 0 | 0 | 0 | 0 |
| Group 2 | | | | | | | | | |
| Population with material UBN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Population with low quality of life | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Group 3 | | | | | | | | | |
| Minimum temperature coldest month¹,² | 0 | 0 | 0 | 0 | 0 | 0 | X | 0 | X |
| Altitude | 0 | 0 | 0 | X | 0 | 0 | 0 | 0 | 0 |
| Annual temperature | 0 | 0 | 0 | 0 | X | 0 | 0 | 0 | 0 |
| Maximum temperature warmest month | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Temperature wettest quarter | 0 | 0 | 0 | 0 | 0 | 0 | 0 | X | 0 |
| Temperature driest quarter | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Temperature warmest quarter | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Temperature coldest quarter | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Group 4 | | | | | | | | | |
| Temperature annual range³ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Temperature diurnal range | 0 | 0 | 0 | 0 | 0 | X | 0 | 0 | 0 |
| Group 5 | | | | | | | | | |
| Annual precipitation¹,²,³ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Precipitation wettest month¹,² | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Precipitation wettest quarter¹,² | X | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Precipitation driest month²,³ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Precipitation driest quarter² | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Precipitation warmest quarter³ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Precipitation coldest quarter² | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Group 6 | | | | | | | | | |
| Enhanced vegetation index | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Normalized difference vegetation index | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | X |
| Moderately correlated | | | | | | | | | |
| Soil acidity¹,³ | 0 | 0 | X | 0 | 0 | 0 | 0 | 0 | 0 |
| Precipitation seasonality¹,³ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Soil moisture² | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Isothermality | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Temperature seasonality | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Human influence index | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Infant mortality rate | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Human development index | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Population with education UBN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Population with overcrowding UBN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Population with sanitation UBN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Population with light at home | 0 | 0 | 0 | 0 | X | 0 | 0 | 0 | 0 |
| Unemployment rate | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Posterior probability [%] | 42.2 | 5.9 | 2.9 | 10.1 | 6.0 | 5.2 | 10.2 | 4.7 | 2.0 |

¹ Categorised for A. lumbricoides; ² categorised for T. trichiura; ³ categorised for hookworm; X (selected), 0 (not selected).

The best three models selected by the geostatistical variable selection are presented for each soil-transmitted helminth species, together with their
posterior probabilities.



survey locations, respectively, covering the period from 
1960 to 2010. The frequency distribution of the surveys, 
stratified by helminth species, is given in Figure 2. Six 
surveys out of 49 were reported at municipality level 
(administrative level 3) and were assigned to the centroid 
of their municipality. The remaining 43 locations were 
reported at school or village level and were therefore 



considered as point data. Most of the studies (71%) expli- 
citly screened school-aged children (the remaining stud- 
ies are either referring to entire populations or provide 
no information on the age range of the participants). 
With regard to the diagnosis of soil-transmitted 
helminthiasis, 47% of the studies used the WHO- 
recommended Kato-Katz technique [41], whereas in 21 



Table 4 Parameter estimates of non-spatial bivariate and Bayesian geostatistical logistic models with environmental
and socio-economic predictors

| Parameter | Bivariate non-spatial OR† | 95% CI† | Geostatistical model OR† | 95% BCI† |
|---|---|---|---|---|
| A. lumbricoides infection | | | | |
| Survey period: before 1995 | 1.00 | | 1.00 | |
| Survey period: 1995 onwards | 0.26 | (0.24; 0.29) | 0.94 | (0.64; 1.42) |
| Precipitation wettest quarter: <350 mm | 1.00 | | 1.00 | |
| Precipitation wettest quarter: 350-400 mm | 1.42 | (1.23; 1.66) | 1.32 | (0.56; 2.81) |
| Precipitation wettest quarter: >400 mm | 12.25 | (10.95; 13.70)* | 12.52 | (5.05; 25.56)* |
| $\sigma^2_{sp}$ (median) | | | 1.11 | (0.72; 2.00) |
| Range (km) (median) | | | 9.2 | (1.3; 63.0) |
| T. trichiura infection | | | | |
| Survey period: before 1995 | 1.00 | | 1.00 | |
| Survey period: 1995 onwards | 0.33 | (0.29; 0.37) | 0.85 | (0.55; 1.30) |
| Altitude | 0.33 | (0.31; 0.36)* | 0.37 | (0.26; 0.56)* |
| $\sigma^2_{sp}$ (median) | | | 1.29 | (0.77; 2.23) |
| Range (km) (median) | | | 28.7 | (3.2; 80.2) |
| Hookworm infection | | | | |
| Survey period: before 1995 | 1.00 | | 1.00 | |
| Survey period: 1995 onwards | 0.45 | (0.41; 0.50) | 0.72 | (0.12; 4.19) |
| Minimum temperature coldest month | 6.25 | (5.81; 6.72)* | 11.35 | (5.00; 22.20)* |
| $\sigma^2_{sp}$ (median) | | | 3.07 | (1.50; 7.44) |
| Range (km) (median) | | | 128.4 | (39.8; 387.5) |

† OR: odds ratio; 95% CI: lower and upper bound of a 95% confidence interval; 95% BCI: lower and upper bound of a 95% Bayesian credible interval.
* Significant based on 95% CI or 95% BCI.
Figure 3 Ascaris lumbricoides infection risk in Bolivia. The maps show the situation before 1995 (A) and from 1995 onwards (B), and provide
estimates of the geographical distribution of the infection (1), the observed prevalence (2), and the coefficient of variation (3). 



locations the diagnostic approach was not stated, and in 
five locations other diagnostic techniques were utilised. 

Table 3 summarises, for each helminth species, the three 
best models resulting from the geostatistical variable se- 
lection. For A. lumbricoides, the model based on precipita- 
tion of the wettest quarter has the highest posterior 
probability of 42.2%. For T. trichiura the best model in- 
cluded altitude (posterior probability = 10.1%), while for 
hookworm, the model with the highest posterior probabil- 
ity (10.2%) included the minimum temperature during the 
coldest month. Results of the geostatistical logistic regres- 
sions, together with estimates of the bivariate non-spatial 
associations, are presented in Table 4. Precipitation of the
wettest quarter above 400 mm had a positive effect on the
odds of A. lumbricoides infection risk; hookworm infection
risk was positively associated with the minimum
temperature during the coldest month, and the higher the
altitude, the lower the odds of T. trichiura infection. Although
the risk of infection with the three helminth species
decreased after 1995, this effect was not important in
the spatial models as reflected by the 95% BCI of the odds
ratio estimates. Figures 3, 4, and 5 show the geographical
distribution of the predicted risks for each of the three 
soil-transmitted helminth species before and after 1995, 



the corresponding standard deviation of the predictive dis- 
tribution and the raw survey data. Maps of all predictors 
involved in the final geostatistical models are shown in 
Figure 6. Bolivia presents generally a lower risk of soil- 
transmitted helminthiasis in the south-western part of the 
country, where high altitude brings unsuitable climatic 
conditions for the development of the parasites. For the 
three soil-transmitted helminth infections, the maps of 
the posterior standard deviation reflect the pattern of the 
predicted risk. However, we note that for hookworm, 
where the spatial correlation is more important (spatial 
range estimated to 128.4 km), the standard deviation was 
also low in areas surrounding the survey locations, 
suggesting less uncertainty in the estimation of the spatial 
random effect in the neighbourhood of observed data. 
Figure 7 shows that the risks of A. lumbricoides,
T. trichiura, and hookworm infection are correctly
predicted within 95% BCIs for 90%, 90%, and 80% of the
test locations, respectively.

Table 5 shows the total amount of treatment required on 
a yearly basis and the associated cost when the calculation 
is based on soil-transmitted helminth infection risk esti- 
mates, aggregated to various administrative levels. The 
estimated number of children targeted increases from 



Chammartin et al. Parasites & Vectors 2013, 6:152 
http://www.parasitesandvectors.com/content/6/1/152







Figure 4 Trichuris trichiura infection risk in Bolivia. The maps show the situation before 1995 (A) and from 1995 onwards (B), and provide 
estimates of the geographical distribution of the infection (1), the observed prevalence (2), and the coefficient of variation (3). 



1,481,605 to 2,180,101, depending on the administrative 
level at which the risk is aggregated. However, the number 
of treatments required remains quite stable, indicating 
large spatial heterogeneity of the infection risk within the 
units. Model-based predictions and estimates of number 
of school-aged children infected with the three soil-trans- 
mitted helminth species, aggregated at province and coun- 
try level, are presented in the Additional file 1. The 
estimated prevalence for A. lumbricoides, T. trichiura, and
hookworm infection is 38.0%, 19.3%, and 11.4%, respect- 
ively. Taking the three soil-transmitted helminth species 
together, we estimate that 48.4% of the school-aged popu- 
lation is infected with at least one species, assuming inde- 
pendence of the three soil-transmitted helminth infections. 
The highest number of school-aged children needing treat- 
ment is concentrated in the densely populated Andres 
Ibanez province, while the highest risk for the three soil- 
transmitted helminths taken together is predicted for the 
Vaca Diez province. 
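
The independence assumption used for the combined estimate can be illustrated with a short sketch. The function names, the three-pixel arrays, and the choice to combine the species per pixel before population weighting are assumptions for illustration; this section does not spell out the order of operations. Applying the same formula directly to the three national figures would give 1 - 0.620 x 0.807 x 0.886, or about 55.7%, rather than 48.4%, so the published value evidently does not come from combining the national prevalences in that way.

```python
import numpy as np

def any_sth_risk(p_ascaris, p_trichuris, p_hookworm):
    """Probability of at least one infection, assuming the three
    infections occur independently of each other."""
    return 1.0 - (1.0 - p_ascaris) * (1.0 - p_trichuris) * (1.0 - p_hookworm)

def population_weighted(risk, population):
    """Population-weighted mean of a per-pixel risk surface."""
    return float(np.sum(risk * population) / np.sum(population))

# Hypothetical three-pixel example (not the study data).
p_a = np.array([0.60, 0.20, 0.05])   # A. lumbricoides risk per pixel
p_t = np.array([0.40, 0.10, 0.02])   # T. trichiura risk per pixel
p_h = np.array([0.30, 0.05, 0.00])   # hookworm risk per pixel
pop = np.array([1000, 5000, 4000])   # school-aged children per pixel

combined = any_sth_risk(p_a, p_t, p_h)        # per-pixel combined risk
overall = population_weighted(combined, pop)  # aggregated prevalence
```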

Discussion 

We present spatially explicit estimates of the risk and 
number of school-aged children infected with the three 
common soil-transmitted helminths in Bolivia using a 
rigorous geostatistical variable selection approach. Survey 
data were extracted from the literature, geo-referenced, 
and made public via the open-access GNTD database. 
Our study also identified important data needs and gaps. 
For example, most of the surveys were conducted in the 
sub-Andean region, whereas only a few survey locations 
were available in the less densely populated highlands and 
in the northern tropical areas. Rigorous geostatistical 
variable selection methods were used to identify 
environmental and socioeconomic determinants that govern 
the distribution of soil-transmitted helminth infection in 
Bolivia. The country, nestled between the high Andean peaks 
(to the west) and the Amazon forest (to the east), presents 
specific ecological characteristics that shape helminth 
life cycles in a complex way. High altitude and diverse 
topography, as well as the paucity of weather stations in 
remote areas, can introduce interpolation bias in the 
climatic factors used in our analysis [42]. Bayesian 
variable selection helped to identify the potential 
factors influencing the geographical distribution of the 
three common soil-transmitted helminth species. Our 
methodology enabled us to explore all possible models 
arising from 40 climatic and socioeconomic predictors, 
while accounting for spatial correlation in the data. 
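
To give a sense of the size of that model space, the sketch below counts the candidate models under the simplifying assumption that each of the 40 predictors is either included or excluded (allowing alternative functional forms per predictor would enlarge the count). The fitting speed is a hypothetical figure chosen only to make the scale tangible.

```python
# Each predictor is either in or out of the model (assumption).
n_predictors = 40
n_models = 2 ** n_predictors

# Hypothetical speed: 1,000 geostatistical model fits per second.
models_per_second = 1_000
seconds_per_year = 365.25 * 24 * 3600

years_needed = n_models / models_per_second / seconds_per_year
print(f"{n_models:,} candidate models, about {years_needed:.0f} years to fit them all")
```

Exhaustive fitting is clearly impractical at this scale, which is why Bayesian variable selection is typically carried out by sampling over models rather than fitting each one in turn; the exact algorithm used in the study is not described in this section.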
Figure 5 Hookworm infection risk in Bolivia. The maps show the situation before 1995 (A) and from 1995 onwards (B), and provide estimates 
of the geographical distribution of the infection (1), the observed prevalence (2), and the coefficient of variation (3). 



The parameterisation of the prior distribution of the 
regression coefficients developed in this manuscript 
selects the best predictors among highly correlated ones, 
while addressing non-linearity. The selected predictors 
are plausible in terms of helminth biology, ecology, and 
epidemiology. Indeed, the distribution of A. lumbricoides 
was positively associated with precipitation above 400 mm 
during the wettest quarter. High humidity is associated 
with faster development of parasite eggs in the free 
environment. Low humidity, on the other hand, can halt 
embryonation of A. lumbricoides [43,44]. The positive 
association between the minimum temperature of the 
coldest month and the prevalence of hookworm reflects 
inhibition of egg development by hostile cold 
temperatures [3,45]. The protective effect of high altitude 
on T. trichiura infection risk has been highlighted before 
and is explained by the resulting unfavourable 
temperatures, which limit transmission [46]. The three 
soil-transmitted helminth infection risks did not decrease 
significantly over time, and we are unsure whether Bolivia 
has implemented integrated control measures. In the 
absence of preventive chemotherapy and/or sanitation 
improvement, environmental contamination is considerable, 
which may explain our observations of fairly constant 
infection rates over time [47,48]. 

The transmission of soil-transmitted helminthiasis occurs 
via contaminated food or fingers (A. lumbricoides 
and T. trichiura), or through the skin by walking on 
larvae-infested soil (hookworm). People living in poverty 
are more exposed because of their living conditions and 
the lack of access to clean water, sanitation, and health 
facilities [49]. Hence, we would have expected 
soil-transmitted helminth infections to be associated with 
some of the socioeconomic factors investigated, such as 
those related to sanitation [50]. However, none of the 
socioeconomic variables was selected by our 
geostatistical variable selection approach. This may 
indicate that our socioeconomic proxies were not able to 
capture the socioeconomic disparities across the country 
when aggregated at district or municipality scales. 
Historical data are aggregated over villages or larger 
areas and are rarely available at household level. 
Variation in socioeconomic status is often larger within 
than between locations; hence, it may be harder for 
socioeconomic data to explain geographical differences. 
Bolivian soil also exhibits specific characteristics, such 
as the presence of salt and soil compaction arising from 
livestock farming, which may affect the transmission of 
soil-transmitted helminths. In our analysis, we explored 
different soil predictors, including land cover, the 
vegetation indices EVI and NDVI, soil acidity, and soil 
moisture. However, these factors failed to explain the 
distribution of the infection risks. 

The population of Bolivia is mainly concentrated in 
and around the three main cities, La Paz, Santa Cruz, 
and Cochabamba, whereas large parts of the country are 
uninhabited. The absence of human hosts breaks parasite 
life cycles. Thus, although environmental conditions 
may be suitable for parasite survival, there is no risk of 
transmission in these areas. To avoid potential 
misinterpretation, we clearly delineate areas where no 
humans live. 




Figure 7 Proportion of locations with observed prevalence 
falling within credible intervals of the posterior predictive 
distribution with probability coverage varying from 1% 
to 100%. Horizontal axis: credible interval, CI (%), from 0 to 100. 



Table 5 Yearly estimation of school-aged children 
needing preventive chemotherapy against soil-transmitted 
helminthiasis in Bolivia 

                               5 x 5 km    Municipality  Province    Department 
Number of children targeted    1,481,605   1,749,136     1,907,658   2,180,101 
Number of treatments required  2,894,936   2,868,016     2,847,604   3,013,413 
Cost (US$)                     723,734     717,003       711,901     753,353 

Estimates are based on prevalence predicted at pixels of 5 x 5 km resolution 
and aggregated over different administrative levels. 

The predicted risk maps for the three common 
soil-transmitted helminth species in Bolivia should be 
interpreted with caution, particularly for areas 
characterised by sparse survey data or poor coverage. 
The sampling design is not optimised with regard to the 
surveyed population; 29% of the data did not report the 
survey type (school-based or community-based), which might 
bias the raw prevalence, as it is widely acknowledged that 
school-aged children are at higher risk of soil-transmitted 
helminth infection, particularly with A. lumbricoides and 
T. trichiura, than their older counterparts [51]. Slightly 
less than half of the surveys stated the use of the 
WHO-recommended Kato-Katz technique for soil-transmitted 
helminth diagnosis [41,52]. Heterogeneity in the 
sensitivities and specificities of the diagnostic methods 
might introduce measurement errors in the raw prevalence 
data. Furthermore, a zero hookworm prevalence was reported 
for 60% of the survey data. While these data suggest the 
non-endemicity of hookworm, the diagnostic approach might 
have underestimated the "true" prevalence due to diagnostic 
dilemmas [53,54]. Indeed, single Kato-Katz thick smears, 
low intensity infections, and delays in stool processing 
compromise sensitivity, particularly for hookworm 
diagnosis [55,56]. Giardina et al. [24] developed a 
zero-inflated binomial geostatistical model to estimate 
malaria burden when data contain a high proportion of 
zeros. This model could be adapted for soil-transmitted 
helminth infection and implemented in Bolivia as soon as 
more survey data become available. In addition, data in the 
literature usually report on hookworm prevalence without 
differentiating the species (A. duodenale and 
N. americanus). It would be interesting to analyse the two 
species separately, as they may have different ecological 
preferences. 
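
For readers unfamiliar with zero inflation, the sketch below shows the generic zero-inflated binomial probability mass function: with probability `pi_zero` a location is a structural zero (for example, non-endemic), and otherwise the number of positives follows an ordinary binomial distribution. This is the textbook form and is not claimed to reproduce the exact formulation of Giardina et al. [24]; the function name, parameter names, and example values are illustrative assumptions, and the spatial random effects of a geostatistical model are omitted.

```python
import math

def zib_log_pmf(y, n, p, pi_zero):
    """Log-probability of y positives out of n examined under a
    zero-inflated binomial model.

    p       : infection probability where transmission occurs (0 < p < 1)
    pi_zero : probability of a structural zero (0 <= pi_zero < 1)
    """
    # Ordinary binomial log-probability of observing y out of n.
    log_binom = (math.lgamma(n + 1) - math.lgamma(y + 1)
                 - math.lgamma(n - y + 1)
                 + y * math.log(p) + (n - y) * math.log1p(-p))
    if y == 0:
        # A zero arises either structurally or by chance from the binomial.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(log_binom))
    return math.log1p(-pi_zero) + log_binom

# Hypothetical survey: 20 children examined, none positive for hookworm.
prob_zero = math.exp(zib_log_pmf(0, 20, p=0.1, pi_zero=0.6))
total = sum(math.exp(zib_log_pmf(y, 20, p=0.1, pi_zero=0.6)) for y in range(21))
```

With these illustrative values, a zero count is far more likely than under a plain binomial model with the same `p`, which is the behaviour needed when a large share of surveys report no hookworm.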

Our study indicates that in Bolivia almost half (48.4%) 
of the population is infected with at least one of 
the three common soil-transmitted helminths. Our 
empirical-based estimates suggest that a total of 
2,868,016 annualised treatments are required for 
preventive chemotherapy targeting school-aged children at 
the level of the municipalities. This estimate is higher 
than the one previously reported for the country 
(4,774,672 treatments for a 5-year campaign [9,32]). 
Population dynamic models [57-59] could be used to 
predict the effect of preventive chemotherapy on the 
epidemiological pattern of the three common 
soil-transmitted helminths, to evaluate the community 
effectiveness of the programme, and to plan the duration 
of control interventions. 
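
The comparison with the earlier estimate, and the cost figures in Table 5, can be checked with simple arithmetic. The sketch below uses only numbers quoted in the text and in Table 5; spreading the 5-year campaign evenly over the years is an assumption, and the unit cost per treatment is implied by the table rather than stated in this section.

```python
# Figures copied from Table 5 (treatments per year and cost in US$).
treatments = {"5 x 5 km": 2_894_936, "Municipality": 2_868_016,
              "Province": 2_847_604, "Department": 3_013_413}
cost_usd = {"5 x 5 km": 723_734, "Municipality": 717_003,
            "Province": 711_901, "Department": 753_353}

# Implied cost per treatment at each aggregation level.
unit_cost = {level: cost_usd[level] / treatments[level] for level in treatments}

# Previously reported estimate: 4,774,672 treatments over 5 years,
# assumed here to be spread evenly across the years.
previous_per_year = 4_774_672 / 5
ratio = treatments["Municipality"] / previous_per_year

for level, value in unit_cost.items():
    print(f"{level}: US$ {value:.2f} per treatment")
print(f"Earlier estimate: {previous_per_year:,.0f} treatments per year")
print(f"Municipality-level estimate is {ratio:.1f} times higher")
```

Under these assumptions the table implies roughly US$ 0.25 per treatment at every aggregation level, and the municipality-level estimate is about three times the annualised earlier figure.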

Conclusions 

In the framework of a preventive chemotherapy strategy, 
reliable maps of the distribution of infection risk and 
disease burden are needed to enhance the cost-effectiveness 
of interventions. Our high-resolution estimates are based 
on existing data, and their scarcity may raise doubts about 
the value of modelling the disease distribution. However, 
soil-transmitted helminth infections are driven by 
environmental factors and, in the absence of interventions, 
the existing data can establish the relation between the 
risk of infection and climate. Hence, the risk maps 
produced are able to identify areas of high infection risk. 
Validation indicated that the models had good predictive 
ability. We therefore believe that the estimated maps can 
provide important inputs to the sampling design of a 
national survey by indicating the areas requiring more 
surveys. A coherent and optimally designed national survey 
is warranted to estimate more accurately the distribution 
and the number of people at risk of infection, so that 
preventive chemotherapy and other control measures can be 
optimally targeted. 

Additional file 



Additional file 1: Population-adjusted prevalence and estimated 
number of infected children (5-14 years old) with the three common 
soil-transmitted helminth (STH) infections, stratified by province and 
by country, for the period 1995 onwards, based on 2010 population 
estimates with 95% Bayesian credible interval (BCI). 